Abstract

In this paper we study the two player randomized communication complexity of the
sparse set disjointness and the exists-equal problems and give matching lower and
upper bounds (up to constant factors) for any number of rounds for both of these
problems. In the sparse set disjointness problem, each player receives a $k$-subset
of $[m]$ and the goal is to determine whether the sets intersect. For this problem,
we give a protocol that communicates a total of $O(k \log^{(r)} k)$ bits over $r$
rounds and errs with very small probability. Here we can take $r = \log^* k$ to obtain
an $O(k)$ total communication $\log^* k$-round protocol with exponentially small error
probability, improving on the $O(k)$-bits $O(\log k)$-round constant error probability
protocol of Håstad and Wigderson from 1997.

In the exists-equal problem, the players receive vectors $x, y \in [t]^n$ and the goal
is to determine whether there exists a coordinate $i$ such that $x_i = y_i$. Namely, the
exists-equal problem is the OR of $n$ equality problems. Observe that exists-equal is
an instance of sparse set disjointness with $k = n$, hence the protocol above applies
here as well, giving an $O(n \log^{(r)} n)$ upper bound. Our main technical contribution
in this paper is a matching lower bound: we show that when $t = \Omega(n)$, any $r$-round
randomized protocol for the exists-equal problem with error probability at most
1/3 should have a message of size $\Omega(n \log^{(r)} n)$. Our lower bound holds even for
super-constant $r \le \log^* n$, showing that any $O(n)$ bits exists-equal protocol should
have $\log^* n - O(1)$ rounds. Note that the protocol we give errs only with less than
polynomially small probability and provides guarantees on the total communication
for the harder set disjointness problem, whereas our lower bound holds even for
constant error probability protocols and for the easier exists-equal problem with
guarantees on the max-communication. Hence our upper and lower bounds match
in a strong sense.

Our lower bound for constant-round protocols for exists-equal shows that
solving the OR of $n$ instances of the equality problem requires strictly more than
$n$ times the cost of a single instance. To our knowledge this is the first example of
such a super-linear increase in complexity.



1 Introduction 

In a two player communication problem the players, named Alice and Bob, receive
separate inputs, $x$ and $y$, and they communicate in order to compute the value $f(x,y)$
of a function $f$. In an $r$-round protocol, the players can take at most $r$ turns alternately
sending each other a message and the last player to receive a message declares the
output of the protocol. A protocol can be deterministic or randomized; in the latter
case the players can base their actions on a common random source and we measure the
error probability: the maximum, over inputs $(x,y)$, of the probability that the output of
the protocol differs from $f(x,y)$.

1.1 Sparse set disjointness 

Set disjointness is perhaps the most studied problem in communication complexity. In 
the most standard version Alice and Bob receive a subset of [m] := {1, . . . ,m} each, 
with the goal of deciding whether their sets intersect or not. The primary question is 
whether the players can improve on the trivial deterministic protocol, where the first 
player sends the entire input to the other player, thereby communicating m bits. The 
first lower bound on the randomized complexity of this problem was given in [2] by 
Babai et al., who showed that any $\epsilon$-error protocol for disjointness must communicate
$\Omega(\sqrt{m})$ bits. The tight bound of $\Omega(m)$ bits was first given by Kalyanasundaram and
Schnitger [28] and was later simplified by Razborov [42] and Bar-Yossef et al. [3].

In the sparse set disjointness problem $\mathrm{DISJ}^m_k$, the sets given to the players are
guaranteed to have at most $k$ elements. The deterministic communication complexity
of this problem is well understood. The trivial protocol, where Alice sends her entire
input to Bob, solves the problem in one round using $O(k \log(2m/k))$ bits. On the other
hand, an $\Omega(k \log(2m/k))$ bit total communication lower bound can be shown even for
protocols with an arbitrary number of rounds, say using the rank method; see [31], page
175.

The randomized complexity of the problem is far more subtle. The results cited
above immediately imply an $\Omega(k)$ lower bound for this version of the problem. The
folklore 1-round protocol solves the problem using $O(k \log k)$ bits, wherein Alice sends
$O(\log k)$-bit hashes for each element of her set. Håstad and Wigderson [23] gave a
protocol that matches the $\Omega(k)$ lower bound mentioned above. Their $O(k)$-bit
randomized protocol runs in $O(\log k)$ rounds and errs with a small constant probability.
In Section 2, we improve this protocol to run in $\log^* k$ rounds, still with $O(k)$ total
communication, but with exponentially small error in $k$. We also present an $r$-round
protocol for any $r \le \log^* k$ with total communication $O(k \log^{(r)} k)$ and error probability
well below $1/k$; see Theorem 1. (Here $\log^{(r)}$ denotes the iterated logarithm function,
see Section 1.5.) As the exists-equal problem with parameters $t$ and $n$ (see below) is a
special case of $\mathrm{DISJ}^{tn}_n$, our lower bounds for the exists-equal problem (see below) show
that the complexity of this algorithm is optimal for any number $r \le \log^* k$ of rounds, even
if we allow the much larger error probability of 1/3. Buhrman et al. [12] and Woodruff
[45] (as presented in [40]) show an $\Omega(k \log k)$ lower bound for the 1-round complexity of
$\mathrm{DISJ}^m_k$ by a reduction from the indexing problem (a similar reduction was also given in
[16]). We note that these lower bounds do not apply to the exists-equal problem, as the
input distribution they use generates instances inherently specific to the disjointness
problem; furthermore this distribution admits an $O(\log k)$ protocol in two rounds.
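
The folklore one-round protocol mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the choice of 3 log k bits per hash, and the use of a seeded pseudo-random generator as a stand-in for the common random source are ours, not the paper's.

```python
import random

def one_round_hash_protocol(S, T, k, seed):
    """Folklore one-round sketch: Alice sends a short hash of each element of S."""
    bits = 3 * max(1, k).bit_length()        # O(log k) bits per hash (constant 3 is an assumption)
    salt = random.Random(seed).getrandbits(64)  # stands in for the common random source

    def h(e):
        # Pseudo-random hash of element e, computable by both players from the shared salt.
        return random.Random(salt * 1_000_003 + e).getrandbits(bits)

    message = {h(e) for e in S}              # Alice's single message, at most |S| * bits bits
    # Bob answers "intersect" (True) if any of his hashes appears in the message.
    return any(h(e) in message for e in T)
```

The error is one-sided: a common element always produces a common hash, so intersecting inputs are never declared disjoint, while disjoint inputs are misjudged only on a hash collision.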

1.2 The exists-equal problem 

In the equality problem Alice and Bob receive elements $x$ and $y$ of a universe $[t]$ and they
have to decide whether $x = y$. We define the two player communication game
exists-equal with parameters $t$ and $n$ as follows. Each player is given an $n$-dimensional vector
from $[t]^n$, namely $x$ and $y$. The value of the game is one if there exists a coordinate $i \in [n]$
such that $x_i = y_i$, zero otherwise. Clearly, this problem is the OR of $n$ independent
instances of the equality problem.
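
As a small illustration, the game value and one natural way of viewing an exists-equal instance as a sparse set disjointness instance can be written down directly. The encoding of coordinate $i$ with value $x_i$ as the pair $(i, x_i)$ is our assumption for this sketch; the function names are hypothetical.

```python
def exists_equal(x, y):
    """Value of the exists-equal game: 1 if some coordinate agrees, else 0."""
    return int(any(a == b for a, b in zip(x, y)))

def as_disjointness_instance(x):
    """Encode a vector as an n-element set over a universe of size t*n (assumed encoding)."""
    return {(i, a) for i, a in enumerate(x)}

def intersects(S, T):
    """1 if the two sets share an element, else 0."""
    return int(bool(S & T))
```

With this encoding the two vectors agree in some coordinate exactly when the two sets intersect, so any disjointness protocol for $n$-element sets also solves exists-equal.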

The direct sum problem in communication complexity is the study of whether n 
instances of a problem can be solved using less than n times the communication required 
for a single instance of the problem. This question has been studied extensively for 
specific communication problems as well as some classes of problems [13, 25, 26, 6, 18, 24,
21, 4]. The so called direct sum approach is a very powerful tool to show lower bounds 
for communication games. In this approach, one expresses the problem at hand, say 
as the OR of n instances of a simpler function and the lower bound is obtained by 
combining a lower bound for the simpler problem with a direct sum argument. For 
instance, the two-player and multi-player disjointness bounds of [3], the lopsided set 
disjointness bounds [41], and the lower bounds for several communication problems 
that arise from streaming algorithms [27, 33] are a few examples of results that follow 
this approach. 

Exists-equal with parameters $t$ and $n$ is a special case of $\mathrm{DISJ}^{tn}_n$, so our protocols in
Section 2 solve exists-equal. We show that when $t = \Omega(n)$ these protocols are optimal,
namely every $r$-round randomized protocol ($r \le \log^* n$) with error probability at most
1/3 needs to send at least one message of size $\Omega(n \log^{(r)} n)$ bits. See Theorem 4.
Our result shows that computing the OR of $n$ instances of the equality problem requires
strictly more than $n$ times the communication required to solve a single instance of the
equality problem when the number of rounds is smaller than $\log^* n - O(1)$. Recall that
the equality problem admits an $\epsilon$-error $\log(1/\epsilon)$-bit one-round protocol in the common
random source model.
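
For concreteness, a standard one-round equality protocol in the common random source model can be sketched as below. This is the usual random-parity construction, given here as an illustration rather than as the protocol the authors have in mind; the function names and the seeded generator standing in for the shared randomness are assumptions.

```python
import random

def parity(v):
    """Parity of the number of 1 bits of a non-negative integer."""
    return bin(v).count("1") % 2

def equality_protocol(x, y, t, num_bits, seed):
    """Alice sends num_bits parities of x against shared random strings; Bob compares."""
    shared = random.Random(seed)              # stands in for the common random source
    width = max(1, t.bit_length())            # enough bits to write any element of [t]
    for _ in range(num_bits):
        r = shared.getrandbits(width)
        if parity(x & r) != parity(y & r):    # a differing parity proves x != y
            return False
    return True                               # all parities agree: declare "equal"
```

Equal inputs are always accepted. For unequal inputs each random string exposes the difference with probability 1/2, so `num_bits` parities give error $2^{-\text{num\_bits}}$, matching the $\log(1/\epsilon)$-bit cost quoted above.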

For r = 1, our result implies that to compute the OR of n instances of the equality 
problem with constant probability, no protocol can do better than solving each instance 
of the equality problem with high probability so that the union bound can be applied 
when taking the OR of the computed results. The single round case of our lower bound 
also generalizes the $\Omega(n \log n)$ lower bound of Molinaro et al. [36] for the one round
communication problem, where the players have to find all the answers of n equality 
problems, outputting an n bit string. 

1.3 Lower bound techniques 

We obtain our general lower bound via a round elimination argument. In such an
argument one assumes the existence of a protocol $P$ that solves a communication problem,
say $f$, in $r$ rounds. By suitably modifying the internals of $P$, one obtains another
protocol $P'$ with $r - 1$ rounds, which typically solves smaller instances of $f$ or has larger error
than $P$. Iterating this process, one obtains a protocol with zero rounds. If the protocol
we obtain solves non-trivial instances of $f$ with good probability, we conclude that we
have arrived at a contradiction, therefore the protocol we started with, $P$, cannot
exist. Although round elimination arguments have been used for a long time, our round
elimination lemma is the first to prove a super-linear communication lower bound in the
number of primitive problems involved, obtaining which requires new and interesting
ideas.

The general round elimination presented in Section 5 is very involved, but the lower 
bound on the one-round protocols can also be obtained in a more elementary way. As the 
one round case exhibits the most dramatic super-linear increase in the communication 
cost and also generalizes the lower bound in [36], we include this combinatorial argument 
separately in Section 3, see Theorem 2. 

At the heart of the general round elimination lemma is a new isoperimetric inequality
on the discrete cube $[t]^n$ endowed with the Hamming distance. We present this result,
Theorem 3, in Section 4. To the best of our knowledge, the first isoperimetric inequality
on this metric space was proven by Lindsey in [32], where the subsets of $[t]^n$ of a
certain size with the so called minimum induced-edge number were characterized. This
result was rediscovered in [30] and [15] as well. See [1] for a generalization of this
inequality to universes which are $n$-dimensional boxes with arbitrary side lengths. In
[8], Bollobás et al. study isoperimetric inequalities on $[t]^n$ endowed with the $\ell_1$ distance.
For the purposes of our proof we need to find sets $S$ that minimize a substantially
more complicated measure. This measure also captures how spread out $S$ is and can be
described roughly as the average over points $x \in [t]^n$ of the logarithm of the number of
points in the intersection of $S$ and a Hamming ball around $x$.

1.4 Related work 

In [35], a round elimination lemma was given, which applies to a class of problems 
with certain self-reducibility properties. The lemma is then used to get lower bounds
for various problems including the greater-than and the predecessor problems. This 
result was later tightened in [44] to get better bounds for the aforementioned problems. 
Different round elimination arguments were also used in [29, 19, 38, 34, 17, 5] for various 
communication complexity lower bounds and most recently in [9] and [11] for obtaining 
lower bounds for the gapped Hamming distance problem. 

Independently of and in parallel with the present form of this paper, Brody et al. [10] have
also established an $\Omega(n \log^{(r)} n)$ lower bound for the $r$-round communication complexity
of the exists-equal problem with parameter $n$. Their result applies for protocols with a
polynomially small error probability like $1/n$. This stronger assumption on the protocol
allows for simpler proof techniques, namely the information complexity based direct
sum technique developed in several papers including [13], but it is not enough to create
an example where solving the OR of $n$ communication problems requires more than
$n$ times the communication of solving a single instance. Indeed, even in the shared
random source model one needs $\log n$ bits of communication (independent of the number
of rounds) to achieve $1/n$ error in a single equality problem.

1.5 Notation 

For a positive integer $t$, we write $[t]$ for the set of positive integers not exceeding $t$. For
two $n$-dimensional vectors $x, y$, let $\mathrm{Match}(x,y)$ be the number of coordinates where $x$
and $y$ agree. Notice that $n - \mathrm{Match}(x,y)$ is the Hamming distance between $x$ and $y$.
For a vector $x \in [t]^n$ we write $x_i$ for its $i$th coordinate. We denote the distribution of a
random variable $X$ by $\mathrm{dist}(X)$ and the support set of it by $\mathrm{supp}(X)$. We write $\Pr_{x \sim \nu}[\cdot]$
and $\mathbb{E}_{x \sim \nu}[\cdot]$ for the probability and expectation, respectively, when $x$ is distributed
according to a distribution $\nu$. We write $\mu$ for the uniform distribution on $[t]^n$. For
instance, for a set $S \subseteq [t]^n$, we have $\mu(S) = |S|/t^n$.

For $x, y \in [t]^n$ we denote the value of the exists-equal game by $\mathrm{EE}^t_n(x,y)$. Recall
that it is zero if and only if $x$ and $y$ differ in each coordinate. Whenever we drop $t$ from
the notation we assume $t = 4n$. Often we will also drop $n$ and simply denote the game
value by $\mathrm{EE}(x,y)$ if $n$ is clear from the context.

All logarithms in this paper are to the base 2. Analogously, throughout this paper
we take $\exp(x) = 2^x$. We will also use the iterated versions of these functions:

$\log^{(0)} x := x, \qquad \exp^{(0)} x := x,$

$\log^{(r)} x := \log(\log^{(r-1)} x), \qquad \exp^{(r)} x := \exp(\exp^{(r-1)} x)$ for $r \ge 1$.

Moreover we define $\log^* x$ to be the smallest integer $r$ for which $\log^{(r)} x < 2$.
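
The iterated functions translate directly into code. The following is a minimal sketch using floating point base-2 logarithms; the function names are ours.

```python
import math

def iter_log(r, x):
    """log^(r) x: apply the base-2 logarithm r times."""
    for _ in range(r):
        x = math.log2(x)
    return x

def iter_exp(r, x):
    """exp^(r) x: apply x -> 2**x r times."""
    for _ in range(r):
        x = 2.0 ** x
    return x

def log_star(x):
    """Smallest integer r >= 0 with log^(r) x < 2."""
    r = 0
    while x >= 2:
        x = math.log2(x)
        r += 1
    return r
```

For example, $\log^{(3)} 65536 = 2$ is not below 2 while $\log^{(4)} 65536 = 1$ is, so $\log^* 65536 = 4$ under this definition.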

Throughout the paper we ignore divisibility problems, e.g., in Lemma 2 in Section 3
we assume that $t^n/2^{c+1}$ is an integer. Dealing with rounding issues would complicate
the presentation but does not add to the complexity of the proofs.

1.6 Information theory 

Here we briefly review some definitions and facts from information theory that we use
in this paper. For a random variable $X$, we denote its binary Shannon entropy by
$H(X)$. We will also use conditional entropies $H(X \mid Y) = H(X,Y) - H(Y)$. Let $\mu$ and
$\nu$ be two probability distributions, supported on the same set $S$. We denote the binary
Kullback-Leibler divergence between $\mu$ and $\nu$ by $D(\mu \,\|\, \nu)$. A random variable with
Bernoulli distribution with parameter $p$ takes the value 1 with probability $p$ and the
value 0 with probability $1 - p$. The entropy of this variable is denoted by $H_2(p)$. For
two reals $p, q \in (0,1)$, we denote by $D_2(p \,\|\, q)$ the divergence between the Bernoulli
distributions with parameters $p$ and $q$.

If $X \in [t]^n$ and $L \subseteq [n]$, then the projection of $X$ to the coordinates in $L$ is denoted
by $X_L$. Namely, $X_L$ is obtained from $X = (X_1, \ldots, X_n)$ by keeping only the coordinates
$X_i$ with $i \in L$. The following lemma of Chung et al. [14] relates the entropy of a variable
to the entropy of its projections.

Lemma 1. (Chung et al. [14]) Let $\mathrm{supp}(X) \subseteq [t]^n$. We have $\frac{l}{n} H(X) \le \mathbb{E}_L[H(X_L)]$,
where the expectation is taken for a uniform random $l$-subset $L$ of $[n]$.
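
The inequality of Lemma 1 can be checked numerically on small examples. The sketch below assumes $X$ is uniform over a given list of support points (a simplification chosen for brevity) and averages the projected entropy over all $l$-subsets; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(samples):
    """Binary Shannon entropy of the uniform distribution over the list `samples`."""
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in Counter(samples).values())

def projected_entropy_average(support, l):
    """E_L[H(X_L)] for X uniform on `support` and L a uniform random l-subset of coordinates."""
    n = len(support[0])
    subsets = list(combinations(range(n), l))
    return sum(entropy([tuple(x[i] for i in L) for x in support])
               for L in subsets) / len(subsets)
```

For instance, for the four vectors of $[2]^3$ with an odd number of ones, $H(X) = 2$ and every two-coordinate projection also has entropy 2, so the bound $\frac{2}{3} \cdot 2 \le 2$ holds with room to spare.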



1.7 Structure of the paper 

We start in Section 2 with our protocols for the sparse set disjointness. Note that the 
exists-equal problem is a special case of sparse set disjointness, so our protocols work 
also for the exists-equal problem. In the rest of the paper we establish matching lower
bounds showing that the complexity of our protocols is within a constant factor of
optimal for both the exists-equal and the sparse set disjointness problems, and for any
number of rounds. In Section 3 we give an elementary proof for the case of single round 
protocols. In Section 4 we develop our isoperimetric inequality and in Section 5 we use 
it in our round elimination proof to get the lower bound for multiple round protocols. 
Finally in Section 6 we point toward possible extensions of our results. 

2 The upper bound 

Recall that in the communication problem $\mathrm{DISJ}^m_k$, each of the two players is given a
subset of $[m]$ of size at most $k$ and they communicate in order to determine whether
their sets are disjoint or not. In 1997, Håstad and Wigderson [39, 23] gave a probabilistic
protocol that solves this problem with $O(k)$ bits of communication and has constant
one-sided error probability. The protocol takes $O(\log k)$ rounds. Let us briefly review
this protocol as this is the starting point of our protocol.

Let $S, T \subseteq [m]$ be the inputs of Alice and Bob. Observe that if they find a set
$Z$ satisfying $S \subseteq Z \subseteq [m]$, then Bob can replace his input $T$ with $T' = T \cap Z$ as
$T' \cap S = T \cap S$. The main observation is that if $S$ and $T$ are disjoint, then a random set
$Z \supseteq S$ will intersect $T$ in a uniform random subset, so one can expect $|T'| \approx |T|/2$. In the
Håstad-Wigderson protocol the players alternate in finding a random set that contains
the current input of one of them, effectively halving the other player's input. If in this
process the input of one of the players becomes empty, they know the original inputs
were disjoint. If, however, the sizes of their inputs do not show the expected exponential
decrease in time, then they declare that their inputs intersect. This introduces a small
one-sided error. Note that one of the two outcomes happens in $O(\log k)$ rounds. An
important observation is that Alice can describe a random set $Z \supseteq S$ to Bob using an
expected $O(|S|)$ bits by making use of the joint random source. This makes the total
communication $O(k)$.
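
The halving step can be illustrated with a short simulation. This sketch samples the random superset $Z$ directly instead of communicating an index into the shared random source, so it models only the effect on the sets and not the message cost; the function names are ours.

```python
import random

def random_superset(S, m, rng):
    """Uniform random Z with S <= Z <= [m]: keep S, add every other element with prob. 1/2."""
    return set(S) | {e for e in range(1, m + 1) if rng.random() < 0.5}

def halving_step(S, T, m, rng):
    """Replace T by T' = T & Z for a random superset Z of S."""
    return set(T) & random_superset(S, m, rng)
```

Whatever the randomness, $T' \subseteq T$ and $T' \cap S = T \cap S$, so the answer to the disjointness question is preserved while, for disjoint inputs, $T'$ keeps each element of $T$ with probability 1/2.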

In our protocol proving the next theorem, we do almost the same, but we choose the
random sets $Z \supseteq S$ not uniformly, but from a biased distribution favoring ever smaller
sets. This makes the size of the input sets of the players decrease much more rapidly, 
but describing the random set Z to the other player becomes more costly. By carefully 
balancing the parameters we optimize for the total communication given any number of 
rounds. When the number of rounds reaches log* k — 0(1) the communication reaches 
its minimum of 0{k) and the error becomes exponentially small. 

Theorem 1. For any $r \le \log^* k$, there is an $r$-round probabilistic protocol for $\mathrm{DISJ}^m_k$
with $O(k \log^{(r)} k)$ bits total communication. There is no error for intersecting input
sets, and the probability of error for disjoint sets can be made
$O(1/\exp^{(r)}(c \log^{(r)} k) + \exp(-\sqrt{k})) \ll 1/k$ for any constant $c > 1$.

For $r = \log^* k - O(1)$ rounds this means an $O(k)$-bit protocol with error probability
$O(\exp(-\sqrt{k}))$.

Proof. We start with the description of the protocol. Let $S_0$ and $S_1$ be the input sets
of Alice and Bob, respectively. For $1 \le i \le r$, $i$ even, Alice sends a message describing
a set $Z_i \supseteq S_i$ based on her "current input" $S_i$ and Bob updates his "current input"
$S_{i-1}$ to $S_{i+1} := S_{i-1} \cap Z_i$. In odd numbered rounds the same happens with the roles
of Alice and Bob reversed. We depart from the Håstad-Wigderson protocol in the way
we choose the sets $Z_i$: using the shared random source the players generate $l_i$ random
subsets of $[m]$ containing each element of $[m]$ independently and with probability $p_i$. We
will set these parameters later. The set $Z_i$ is chosen to be the first such set containing
$S_i$. Alice or Bob (depending on the parity of $i$) sends the index of this set or ends the
protocol by sending a special error signal if none of the generated sets contain $S_i$. The
protocol ends with declaring the inputs disjoint if the error signal is never sent and we
have $S_{r+1} = \emptyset$. In all other cases the protocol ends with declaring "not disjoint".

This finishes the description of the protocol except for the setting of the parameters.
Note that the error of the protocol is one-sided: $S_0 \cap S_1 = S_i \cap S_{i+1}$ for $i \le r$, so
intersecting inputs cannot yield $S_{r+1} = \emptyset$.

We set the parameters (including $k_i$ used in the analysis) as follows:

    $u = (c+1) \log^{(r)} k,$

    $p_i = \dfrac{1}{\exp^{(i)} u}$ for $1 \le i \le r$,

    $l_1 = k \exp(ku),$

    $l_i = k\, 2^{k/2^{i-4}}$ for $2 \le i \le r$,

    $k_0 = k_1 = k,$

    $k_i = \dfrac{k}{2^{i-4} \exp^{(i-1)} u}$ for $2 \le i \le r$,

    $k_{r+1} = 0.$

The message sent in round $i > 1$ has length $\lceil \log(l_i + 1) \rceil \le k/2^{i-4} + \log k + 1$, thus the
total communication in all rounds but the first is $O(k)$. The length of the first message
is $\lceil \log(l_1 + 1) \rceil \le ku + \log k + 1$. The total communication is $O(ku) = O(ck \log^{(r)} k)$ as
claimed (recall that $c$ is a constant).
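
To get a feel for these bounds, the following sketch tabulates the per-round message length estimates. It assumes the reading $l_i = k\,2^{k/2^{i-4}}$ for $i \ge 2$ (the exponent is our reconstruction, chosen so that $l_i\, p_i^{k_i} = k$), and the function name and default value of `c` are hypothetical.

```python
import math

def message_length_bounds(k, r, c=2):
    """Upper bounds on the message length (in bits) of rounds 1..r."""
    u = float(k)
    for _ in range(r):                      # u = (c + 1) * log^(r) k
        u = math.log2(u)
    u *= (c + 1)
    first = k * u + math.log2(k) + 1        # round 1: k*u + log k + 1
    later = [k / 2 ** (i - 4) + math.log2(k) + 1 for i in range(2, r + 1)]
    return [first] + later
```

The later rounds form a geometric series summing to less than $8k$ plus lower order terms, so the first message dominates whenever $\log^{(r)} k$ is large.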

Let us assume the input pair is disjoint. To estimate the error probability we call
round $i$ bad if an error message is sent or a set $S_{i+1}$ is created with $|S_{i+1}| > k_{i+1}$. If no
bad round exists we have $S_{r+1} = \emptyset$ and the protocol makes no error. In what follows
we bound the probability that round $i$ is bad assuming the previous rounds are not bad
and therefore having $|S_j| \le k_j$ for $0 \le j \le i$.

The probability that a random set constructed in round $i$ contains $S_i$ is $p_i^{|S_i|} \ge p_i^{k_i}$.
The probability that none of the $l_i$ sets contains $S_i$ and thus an error message is sent is
therefore at most $(1 - p_i^{k_i})^{l_i} < e^{-k}$.

If no error occurs in the first bad round $i$, then $|S_{i+1}| > k_{i+1}$. Note that in this case
$S_{i+1} = S_{i-1} \cap Z_i$ contains each element of $S_{i-1}$ independently and with probability $p_i$.
This is because the choice of $Z_i$ was based on it containing $S_i$, so it was independent
of its intersection with $S_{i-1}$ (recall that $S_i \cap S_{i-1} = S_1 \cap S_0 = \emptyset$). For $i < r$ we use
the Chernoff bound. The expected size of $S_{i+1}$ is $|S_{i-1}| p_i \le k_{i-1} p_i \le k_{i+1}/2$, thus the
probability of $|S_{i+1}| > k_{i+1}$ is at most $2^{-k_{i+1}/4}$. Finally for the last round $i = r$ we use
the simpler estimate $p_r k_{r-1} \le k/\exp^{(r)} u$ for the probability of $|S_{r+1}| > k_{r+1} = 0$.

Summing over all these estimates we obtain the following error bound for our protocol:

    $\Pr[\text{error}] \le r e^{-k} + \dfrac{k}{\exp^{(r)} u} + \sum_{i=2}^{r} 2^{-k_i/4}.$

In case $k_r \ge 4\sqrt{k}$ this error estimate proves the theorem. In case $k_r < 4\sqrt{k}$ we need
to make a minor adjustment in the setting of our parameters. We take $j$ to be the
smallest value with $k_j < 4\sqrt{k}$, modify the parameters for round $j$ and stop the protocol
after this round, declaring "disjoint" if $S_{j+1} = \emptyset$ and "intersecting" otherwise. The new
parameters for round $j$ are $k'_j = 4\sqrt{k}$, $p'_j = 2^{-\sqrt{k}}$, $l'_j = k\, 2^{4k}$. This new setting of the
parameters makes the message in the last round linear in $k$, while both the probability
that round $j - 1$ is bad because it makes $|S_j| > k'_j$, and the probability that round $j$ is
bad for any reason (error message or $S_{j+1} \ne \emptyset$) are $O(2^{-\sqrt{k}})$. This finishes the analysis
of our protocol. $\Box$

3 Lower bound for single round protocols 

In this section we give a combinatorial proof that any single round randomized protocol
for the exists-equal problem with parameters $n$ and $t = 4n$ has complexity $\Omega(n \log n)$ if
its error probability is at most 1/3. As pointed out in the Introduction, to our knowledge
this is the first established case when solving the OR of $n$ instances of a communication
problem requires strictly more than $n$ times the complexity needed to solve a single
such instance.

We start with a simple and standard reduction from the randomized protocol
to the deterministic one and further to a large set of inputs that makes the first (and in 
this case only) message fixed. These steps are also used in the general round elimination 
argument therefore we state them in general form. 

Let $\epsilon > 0$ be a small constant and let $P$ be a 1/3-error randomized protocol for
the exists-equal problem with parameters $n$ and $t = 4n$. We repeat the protocol $P$ in
parallel taking the majority output, so that the number of rounds does not change, the
length of the messages is multiplied by a constant and the error probability decreases
below $\epsilon$. Now we fix the coins of this $\epsilon$-error protocol in a way to make the resulting
deterministic protocol err on at most an $\epsilon$ fraction of the possible inputs. Denote the
deterministic protocol we obtain by $Q$.

Lemma 2. Let $Q$ be a deterministic protocol for the $\mathrm{EE}_n$ problem that makes at most
$\epsilon$ error on the uniform distribution. Assume Alice sends the first message of length $c$.
There exists an $S \subseteq [t]^n$ of size $\mu(S) = 2^{-c-1}$ such that the first message of Alice is
fixed when $x \in S$ and we have $\Pr_{y \sim \mu}[Q(x,y) \ne \mathrm{EE}(x,y)] \le 2\epsilon$ for all $x \in S$.



Proof. Note that the quantity $e(x) = \Pr_{y \sim \mu}[Q(x,y) \ne \mathrm{EE}(x,y)]$, averaged over all $x$, is
the error probability of $Q$ on the uniform input, hence is at most $\epsilon$. Therefore for at
least half of $x$, we have $e(x) \le 2\epsilon$. The first message of Alice partitions this half into at
most $2^c$ subsets. We pick $S$ to consist of $t^n/2^{c+1}$ vectors of the same part: at least one
part must have this many elements. $\Box$

We fix a set S as guaranteed by the lemma. We assume we started with a single 
round protocol, so $Q(x,y) = Q(x',y)$ whenever $x, x' \in S$. Indeed, Alice sends the same
message by the choice of S and then the output is determined by Bob, who has the 
same input in the two cases. 

We call a pair $(x,y)$ bad if $x \in S$, $y \in [t]^n$ and $Q$ errs on this input, i.e., $Q(x,y) \ne
\mathrm{EE}(x,y)$. Let $b$ be the number of bad pairs. By Lemma 2 each $x \in S$ is involved in at
most $2\epsilon t^n$ bad pairs, so we have

    $b \le 2\epsilon |S| t^n.$

We call a triple $(x, x', y)$ bad if $x, x' \in S$, $y \in [t]^n$, $\mathrm{EE}(x,y) = 1$ and $\mathrm{EE}(x',y) = 0$. The
proof is based on double counting the number $z$ of bad triples. Note that for a bad
triple $(x,x',y)$ we have $Q(x,y) = Q(x',y)$ but $\mathrm{EE}(x,y) \ne \mathrm{EE}(x',y)$, so $Q$ must err on
either $(x,y)$ or $(x',y)$, making one of these pairs bad. Any pair (bad or not) is involved
in at most $|S|$ bad triples, so we have

    $z \le b|S| \le 2\epsilon |S|^2 t^n.$

Let us fix arbitrary $x, x' \in S$ with $\mathrm{Match}(x,x') < n/2$. We estimate the number
of $y \in [t]^n$ that make $(x,x',y)$ a bad triple. Such a $y$ must have $\mathrm{Match}(x,y) >
\mathrm{Match}(x',y) = 0$. To simplify the calculation we only count the vectors $y$ with
$\mathrm{Match}(x,y) = 1$. The match between $y$ and $x$ can occur at any position $i$ with
$x_i \ne x'_i$. After fixing the coordinate $y_i = x_i$ we can pick the remaining coordinates
$y_j$ of $y$ freely as long as we avoid $x_j$ and $x'_j$. Thus we have

    $|\{y \mid (x,x',y) \text{ is bad}\}| \ge (n - \mathrm{Match}(x,x'))(t-2)^{n-1} \ge (n/2)(t-2)^{n-1} \ge t^n/14,$

where in the last inequality we used $t = 4n$. Let $s$ be the size of the Hamming ball
$B_{n/2}(x) = \{y \in [t]^n \mid \mathrm{Match}(x,y) \ge n/2\}$. By the Chernoff bound we have $s \le t^n/n^{n/2}$
(using $t = 4n$ again). For a fixed $x$ we have at least $|S| - s$ choices for $x' \in S$ with
$\mathrm{Match}(x,x') < n/2$, for which the above bound on triples applies. Thus we have

    $z \ge |S|(|S| - s)\, t^n/14.$

Combining this with the upper bound on the number of bad triples we get

    $28\epsilon |S| \ge |S| - s.$

Therefore we conclude that we either have large error $\epsilon > 1/56$ or else we have
$|S| \le 2s \le 2t^n/n^{n/2}$. As we have $|S| = t^n/2^{c+1}$, the latter possibility implies

    $c \ge (n \log n)/2 - 2.$

Summarizing we have the following. 


Theorem 2. A single round probabilistic protocol for $\mathrm{EE}_n$ with error probability 1/3
has complexity $\Omega(n \log n)$.

A single round deterministic protocol for $\mathrm{EE}_n$ that errs on at most a 1/56 fraction of
the inputs has complexity at least $(n \log n)/2 - 2$.
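
The two counting estimates used in the argument of this section, namely $(n/2)(t-2)^{n-1} \ge t^n/14$ and the bound $s \le t^n/n^{n/2}$ on the Hamming ball for $t = 4n$, can be verified exactly for small $n$. The sketch below uses integer arithmetic only; the function names are ours.

```python
from math import comb

def ball_size(n):
    """Number of y in [t]^n with Match(x, y) >= n/2, for t = 4n and any fixed x."""
    t = 4 * n
    # Choose the j matching coordinates; each other coordinate has t - 1 non-matching values.
    return sum(comb(n, j) * (t - 1) ** (n - j) for j in range(n + 1) if 2 * j >= n)

def estimates_hold(n):
    """Check both estimates for a given n without floating point."""
    t = 4 * n
    s = ball_size(n)
    first = 7 * n * (t - 2) ** (n - 1) >= t ** n   # (n/2)(t-2)^(n-1) >= t^n/14
    second = s * s * n ** n <= t ** (2 * n)         # s <= t^n / n^(n/2), squared
    return first and second
```

Squaring the second inequality avoids taking the square root of $n^n$ when $n$ is odd.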

4 An isoperimetric inequality on the discrete grid 

The isoperimetric problem on the Boolean cube $\{0,1\}^n$ has proved extremely useful in
theoretical computer science. The problem is to determine the set $S \subseteq \{0,1\}^n$ of a fixed
cardinality with the smallest "perimeter", or more generally, to establish a connection
between the size of a set and the size of its boundary. Here the boundary can be defined
in several ways. Considering the Boolean cube as a graph where vertices of Hamming 
distance 1 are connected, the edge boundary of a set S is defined as the set of edges 
connecting S and its complement, while the vertex boundary consists of the vertices 
outside S having a neighbor in S. 

Harper [20] showed that the vertex boundary of a Hamming ball is smallest among 
all sets of equal size, and the same holds for the edge boundary of a subcube. These 
results can be generalized to other cardinalities [22]; see the survey by Bezrukov [7]. 

Consider the metric space over the set $[t]^n$ endowed with the Hamming distance.
Let $f$ be a concave function on the nonnegative integers and $1 \le M \le n$ be an integer.
We consider the following value as a generalized perimeter of a set $S \subseteq [t]^n$:

    $\mathbb{E}_{x \sim \mu}\left[f(|B_M(x) \cap S|)\right],$

where $B_M(x) = \{y \in [t]^n \mid \mathrm{Match}(x,y) \ge M\}$ is the radius $n - M$ Hamming ball around
$x$. Note that when $M = n - 1$ and $f$ is the counting function given as $f(0) = 0$ and
$f(l) = 1$ for $l > 0$ (which is concave), the above quantity is exactly the normalized
size of the vertex boundary of $S$. For other concave functions $f$ and parameters $M$
this quantity can still be considered a measure of how "spread out" the set $S$ is. We
conjecture that $n$-dimensional boxes minimize this measure in every case.
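
For very small parameters the generalized perimeter can be computed by brute force, which may help in experimenting with the conjecture. The sketch below enumerates all of $[t]^n$ and uses exact rational arithmetic; the function names are hypothetical and the running time is exponential in $n$.

```python
from fractions import Fraction
from itertools import product

def match(x, y):
    """Number of coordinates where x and y agree."""
    return sum(a == b for a, b in zip(x, y))

def generalized_perimeter(S, t, n, M, f):
    """E over uniform x in [t]^n of f(|B_M(x) & S|), computed exactly."""
    total = Fraction(0)
    for x in product(range(1, t + 1), repeat=n):
        hits = sum(1 for y in S if match(x, y) >= M)   # |B_M(x) & S|
        total += f(hits)
    return total / t ** n
```

For example, with $t = n = 2$, $M = 1$, $S = \{(1,1)\}$ and the counting function $f(l) = \min(l, 1)$, three of the four points of $[2]^2$ have a ball meeting $S$, giving the value 3/4.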

Conjecture 1. Let $1 \le k \le t$ and $1 \le M \le n$ be integers. Let $S$ be an arbitrary subset
of $[t]^n$ of size $k^n$ and $P = [k]^n$. We have

    $\mathbb{E}_{x \sim \mu}\left[f(|B_M(x) \cap P|)\right] \le \mathbb{E}_{x \sim \mu}\left[f(|B_M(x) \cap S|)\right].$

Even though a proof of Conjecture 1 remained elusive, in Theorem 3, we prove an 
approximate version of this result, where, for technical reasons, we have to restrict our 
attention to a small fraction of the coordinates. Having this weaker result allows us to 
prove our communication complexity lower bound in the next section but proving the 
conjecture here would simplify this proof. 

We start the technical part of this section by introducing the notation we will use.
For $x, y \in [t]^n$ and $i \in [n]$ we write $x \sim_i y$ if $x_j = y_j$ for $j \in [n] \setminus \{i\}$. Observe that $\sim_i$ is
an equivalence relation. A set $K \subseteq [t]^n$ is called an $i$-ideal if $x \sim_i y$, $x_i < y_i$ and $y \in K$
implies $x \in K$. We call a set $K \subseteq [t]^n$ an ideal if it is an $i$-ideal for all $i \in [n]$.
For $i \in [n]$ and $x \in [t]^n$ we define $\mathrm{down}_i(x) = (x_1, \ldots, x_{i-1}, x_i - 1, x_{i+1}, \ldots, x_n)$. We
have $\mathrm{down}_i(x) \in [t]^n$ whenever $x_i > 1$. Let $K \subseteq [t]^n$ be a set, $i \in [n]$ and $2 \le a \in [t]$.
For $x \in K$, we define $\mathrm{down}_{i,a}(x, K) = \mathrm{down}_i(x)$ if $x_i = a$ and $\mathrm{down}_i(x) \notin K$, and we set
$\mathrm{down}_{i,a}(x, K) = x$ otherwise. We further define $\mathrm{down}_{i,a}(K) = \{\mathrm{down}_{i,a}(x, K) \mid x \in K\}$.
For $K \subseteq [t]^n$ and $i \in [n]$ we define

    $\mathrm{down}_i(K) = \{y \in [t]^n \mid y_i \le |\{z \in K \mid y \sim_i z\}|\}.$

Finally for $K \subseteq [t]^n$ we define

    $\mathrm{down}(K) = \mathrm{down}_1(\mathrm{down}_2(\ldots \mathrm{down}_n(K) \ldots)).$
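
The set-level down operations are easy to experiment with. In the sketch below a set $K$ is a Python set of tuples with entries in $1, \ldots, t$ and coordinates are indexed from 0 rather than 1; the function names are ours.

```python
from collections import defaultdict

def down_i(K, i):
    """Compress K along coordinate i: each class of ~_i with c elements becomes values 1..c."""
    class_sizes = defaultdict(int)
    for z in K:
        class_sizes[z[:i] + z[i + 1:]] += 1          # key: all coordinates except i
    return {rest[:i] + (v,) + rest[i:]
            for rest, c in class_sizes.items() for v in range(1, c + 1)}

def down(K, n):
    """Apply the compression for the last coordinate first and the first coordinate last."""
    for i in reversed(range(n)):
        K = down_i(K, i)
    return K

def is_ideal(K):
    """True if lowering any coordinate greater than 1 of any element stays inside K."""
    return all(x[:i] + (x[i] - 1,) + x[i + 1:] in K
               for x in K for i in range(len(x)) if x[i] > 1)
```

On $K = \{(2,3), (3,3), (1,2)\}$ the second-coordinate compression yields $\{(2,1), (3,1), (1,1)\}$ and the first-coordinate compression leaves it unchanged, so $\mathrm{down}(K)$ has the same size as $K$ and is an ideal.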
The following lemma states a few simple observations about these down operations.

Lemma 3. Let $K \subseteq [t]^n$ be a set and let $i, j \in [n]$ be integers. The following hold.

(i) $\mathrm{down}_i(K)$ can be obtained from $K$ by applying several operations $\mathrm{down}_{i,a}$.

(ii) $|\mathrm{down}_{i,a}(K)| = |K|$ for each $2 \le a \le t$, $|\mathrm{down}_i(K)| = |K|$ and $|\mathrm{down}(K)| = |K|$.

(iii) $\mathrm{down}_i(K)$ is an $i$-ideal and if $K$ is a $j$-ideal, then $\mathrm{down}_i(K)$ is also a $j$-ideal.

(iv) $\mathrm{down}(K)$ is an ideal. For any $x \in \mathrm{down}(K)$ we have $P := [x_1] \times [x_2] \times \cdots \times [x_n] \subseteq
\mathrm{down}(K)$ and there exists a set $T \subseteq K$ with $P = \mathrm{down}(T)$.

Proof. For statement (i) notice that as long as $K$ is not an $i$-ideal one of the operations
$\mathrm{down}_{i,a}$ will not fix $K$ and hence will decrease $\sum_{x \in K} x_i$. Thus a finite sequence of these
operations will transform $K$ into an $i$-ideal. It is easy to see that the operations $\mathrm{down}_{i,a}$
preserve the number of elements in each equivalence class of $\sim_i$, thus the $i$-ideal we
arrive at must indeed be $\mathrm{down}_i(K)$.

Statement (ii) follows directly from the definitions of each of these down operations. 

The first claim of statement (iii), namely that $\mathrm{down}_i(K)$ is an $i$-ideal, is trivial from
the definition. Now assume $j \ne i$ and $K$ is a $j$-ideal, $y \in \mathrm{down}_i(K)$ and $y_j > 1$. To
see that $\mathrm{down}_i(K)$ is a $j$-ideal it is enough to prove that $\mathrm{down}_j(y) \in \mathrm{down}_i(K)$. Since
$y \in \mathrm{down}_i(K)$, there are $y_i$ distinct vectors $z \in K$ that satisfy $z \sim_i y$. Considering the
vectors $\mathrm{down}_j(z) \sim_i \mathrm{down}_j(y)$ and using that these distinct vectors are in the $j$-ideal $K$
proves that $\mathrm{down}_j(y)$ is indeed contained in $\mathrm{down}_i(K)$.

By statement (iii), $\mathrm{down}(K)$ is an $i$-ideal for each $i \in [n]$. Therefore $\mathrm{down}(K)$ is
an ideal and the first part of statement (iv), that is, $P \subseteq \mathrm{down}(K)$, follows. We prove the
existence of a suitable $T$ by induction on the dimension $n$. The base case $n = 0$ (or even
$n = 1$) is trivial. For the inductive step consider $K' = \mathrm{down}_2(\mathrm{down}_3(\ldots \mathrm{down}_n(K) \ldots))$.
As $x \in \mathrm{down}(K) = \mathrm{down}_1(K')$, we have distinct vectors $x^{(k)} \in K'$ for $k = 1, \ldots, x_1$,
satisfying $x^{(k)} \sim_1 x$. Notice that the construction of $K'$ from $K$ is performed
independently on each of the $(n-1)$-dimensional "hyperplanes" $S^l = \{y \in [t]^n \mid y_1 = l\}$
as none of the operations $\mathrm{down}_2, \ldots, \mathrm{down}_n$ change the first coordinate of the
vectors. We apply the inductive hypothesis to obtain the sets $T^{(k)} \subseteq S^{x^{(k)}_1} \cap K$ such that
$\mathrm{down}_2(\ldots \mathrm{down}_n(T^{(k)}) \ldots) = \{x^{(k)}_1\} \times [x_2] \times \cdots \times [x_n]$. Using again that these sets
are in distinct hyperplanes and the operations $\mathrm{down}_2, \ldots, \mathrm{down}_n$ act separately on the
hyperplanes $S^l$, we get for $T := \bigcup_{k=1}^{x_1} T^{(k)}$ that

    $\mathrm{down}_2(\ldots \mathrm{down}_n(T) \ldots) = \{x^{(k)}_1 \mid k \in [x_1]\} \times [x_2] \times \cdots \times [x_n].$

Applying $\mathrm{down}_1$ on both sides finishes the proof of this last part of the lemma. $\Box$

For a vector $x \in [t]^n$, a set $I \subseteq [n]$, and an integer $M \in [n]$ we define $B_{I,M}(x) = \{y \in [t]^n \mid \mathrm{Match}(x_I, y_I) \ge M\}$. The projection of $B_{I,M}(x)$ to the coordinates in $I$ is the Hamming ball of radius $|I| - M$ around the projection of $x$.

Lemma 4. Let $I \subseteq [n]$, $M \in [n]$ and let $f$ be a concave function on the nonnegative integers. For arbitrary $K \subseteq [t]^n$ we have

$\mathbb{E}_{x \sim \mu}[f(|B_{I,M}(x) \cap \mathrm{down}(K)|)] \le \mathbb{E}_{x \sim \mu}[f(|B_{I,M}(x) \cap K|)]$.

Proof. By Lemma 3(i), the set $\mathrm{down}(K)$ can be obtained from $K$ by a series of operations $\mathrm{down}_{i,a}$ with various $i \in [n]$ and $2 \le a \le t$. Therefore, it is enough to prove that the expectation in the lemma does not increase in any one step. Let us fix $i \in [n]$ and $2 \le a \le t$. We write $N_x = B_{I,M}(x) \cap K$ and $N'_x = B_{I,M}(x) \cap \mathrm{down}_{i,a}(K)$ for $x \in [t]^n$. We need to prove that

$\mathbb{E}_{x \sim \mu}[f(|N_x|)] \ge \mathbb{E}_{x \sim \mu}[f(|N'_x|)]$.

Note that $|N_x| = |N'_x|$ whenever $i \notin I$ or $x_i \notin \{a, a-1\}$. Thus, we can assume $i \in I$ and concentrate on $x \in [t]^n$ with $x_i \in \{a, a-1\}$. It is enough to prove $f(|N_x|) + f(|N_y|) \ge f(|N'_x|) + f(|N'_y|)$ for any pair of vectors $x, y \in [t]^n$ satisfying $x_i = a$ and $y = \mathrm{down}_i(x)$.

Let us fix such a pair $x, y$ and set $C = \{z \in K \setminus \mathrm{down}_{i,a}(K) \mid \mathrm{Match}(x_I, z_I) = M\}$. Observe that $N_x = N'_x \cup C$ and $N'_x \cap C = \emptyset$. Similarly, observe that $N'_y = N_y \cup \mathrm{down}_{i,a}(C)$ and $N_y \cap \mathrm{down}_{i,a}(C) = \emptyset$. Thus we have $|N'_x| = |N_x| - |C|$ and $|N'_y| = |N_y| + |\mathrm{down}_{i,a}(C)| = |N_y| + |C|$.

The inequality $f(|N_x|) + f(|N_y|) \ge f(|N'_x|) + f(|N'_y|)$ follows now from the concavity of $f$, the inequalities $|N'_x| \le |N_y| \le |N'_y|$ and the equality $|N_x| + |N_y| = |N'_x| + |N'_y|$. Here the first inequality follows from $\mathrm{down}_{i,a}(N'_x) \subseteq \mathrm{down}_{i,a}(N_y)$, while the second inequality and the equality come from the observations of the previous paragraph. □

Lemma 5. Let $K \subseteq [t]^n$ be arbitrary. There exists a vector $x \in K$ having at least $n/5$ coordinates that are greater than $k := \frac{t}{2}\mu(K)^{5/(4n)}$.

Proof. The number of vectors that have at most $n/5$ coordinates greater than $k$ can be upper bounded as

$\binom{n}{n/5} t^{n/5} k^{4n/5} = \binom{n}{n/5} (k/t)^{4n/5}\, t^n = \binom{n}{n/5} \frac{|K|}{2^{4n/5}}$,

where in the last step we have substituted $k = \frac{t}{2}\mu(K)^{5/(4n)}$ and $\mu(K) = |K|/t^n$. Estimating $\binom{n}{n/5} \le 2^{nH(1/5)} < 2^{4n/5}$, where $H(1/5) \approx 0.722$ is the binary entropy of $1/5$, we obtain that the above quantity is less than $|K|$. Therefore, there must exist an $x \in K$ that has at least $n/5$ coordinates greater than $k$. □
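
The counting step in the proof of Lemma 5 reduces to the inequality $\binom{n}{n/5} < 2^{4n/5}$. The following minimal sketch checks it numerically; the helper name `bad_fraction` is hypothetical and $n$ is assumed to be a multiple of 5 so that $n/5$ and $4n/5$ are integers.

```python
from math import comb

def bad_fraction(n):
    """comb(n, n/5) / 2^(4n/5): the factor by which the proof of Lemma 5 bounds
    (number of vectors with at most n/5 coordinates above k) / |K|."""
    if n % 5 != 0:
        raise ValueError("n is assumed to be a multiple of 5")
    return comb(n, n // 5) / 2 ** (4 * n // 5)

print(bad_fraction(5))     # 0.3125, i.e. comb(5, 1) / 2^4
print(all(bad_fraction(n) < 1 for n in range(5, 501, 5)))   # True
```

The ratio shrinks exponentially in $n$, since $H(1/5) \approx 0.722$ is strictly below $4/5$.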
Theorem 3. Let $S$ be an arbitrary subset of $[t]^n$. Let $k = \frac{t}{2}\mu(S)^{5/(4n)}$ and $M = nk/(20t)$. There exists a subset $T \subseteq S$ of size $k^{n/5}$ and $I \subseteq [n]$ of size $n/5$ such that, defining $N_x = \{x' \in T \mid \mathrm{Match}(x_I, x'_I) \ge M\}$, we have

(i) $\Pr_{x \sim \mu}[N_x = \emptyset] \le 5^{-M}$ and

(ii) $\mathbb{E}_{x \sim \mu}[\log |N_x|] \ge (n/5 - M)\log k - n \log k / 5^M$, where we take $\log 0 = -1$ to make the above expectation exist.

Proof. By Lemma 3(ii), we have $|\mathrm{down}(S)| = |S|$. By Lemma 5, there exists an $x \in \mathrm{down}(S)$ having at least $n/5$ coordinates that are greater than $k$. Let $I \subseteq [n]$ be a set of $n/5$ coordinates such that $x_i > k$ for a fixed $x \in \mathrm{down}(S)$. By Lemma 3(iv), $\mathrm{down}(S)$ is an ideal and thus it contains the set $P = \prod_i P_i$, where $P_i = [k]$ for $i \in I$ and $P_i = \{1\}$ for $i \notin I$. Also by Lemma 3(iv), there exists a $T \subseteq S$ such that $P = \mathrm{down}(T)$. We fix such a set $T$. Clearly, $|T| = k^{n/5}$.

For a vector $x \in [t]^n$, let $h(x)$ be the number of coordinates $i \in I$ such that $x_i \in [k]$. Note that $\mathbb{E}_{x \sim \mu}[h(x)] = 4M$ and $h(x)$ has a binomial distribution. By the Chernoff bound we have $\Pr_{x \sim \mu}[h(x) < M] < 5^{-M}$. For $x$ with $h(x) \ge M$ we have $|B_{I,M}(x) \cap P| \ge k^{n/5 - M}$, but for $h(x) < M$ we have $B_{I,M}(x) \cap P = \emptyset$. With the unusual convention $\log 0 = -1$ we have

$\mathbb{E}_{x \sim \mu}[\log |B_{I,M}(x) \cap P|] \ge \Pr[h(x) \ge M]\,(n/5 - M)\log k - \Pr[h(x) < M]$
$> (n/5 - M)\log k - n \log k / 5^M$.

We have $\mathrm{down}(T) = P$ and our unusual $\log$ is concave on the nonnegative integers, so Lemma 4 applies and proves statement (ii):

$\mathbb{E}_{x \sim \mu}[\log |N_x|] \ge \mathbb{E}_{x \sim \mu}[\log |B_{I,M}(x) \cap P|]$
$> (n/5 - M)\log k - n \log k / 5^M$.

To show statement (i), we apply Lemma 4 with the concave function $f$ defined as $f(0) = -1$ and $f(l) = 0$ for all $l > 0$. We obtain that

$\Pr_{x \sim \mu}[N_x = \emptyset] = -\mathbb{E}_{x \sim \mu}[f(|N_x|)]$
$\le -\mathbb{E}_{x \sim \mu}[f(|B_{I,M}(x) \cap P|)]$
$= \Pr_{x \sim \mu}[B_{I,M}(x) \cap P = \emptyset]$
$< 5^{-M}$.

This completes the proof. □
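
The Chernoff step used above can be checked numerically. The multiplicative Chernoff bound for a binomial variable with mean $4M$ gives $\Pr[h < M] \le (4/e^3)^M$, and $4/e^3 \approx 0.199 < 1/5$. The following minimal sketch compares the exact lower tail with $5^{-M}$; the function name and the sample parameters are illustrative, with `N` playing the role of $n/5$ and `p` the role of $k/t$.

```python
from math import comb, exp

def lower_tail(N, p, M):
    """Exact Pr[Bin(N, p) < M]."""
    return sum(comb(N, j) * p**j * (1 - p)**(N - j) for j in range(M))

# Sample parameters with N * p = 4 * M, matching E[h(x)] = 4M in the proof.
for N, p, M in [(20, 0.2, 1), (40, 0.2, 2), (100, 0.2, 5), (400, 0.1, 10)]:
    tail = lower_tail(N, p, M)
    chernoff = (4 / exp(3)) ** M
    print(N, p, M, tail < chernoff < 5.0 ** (-M))
```

Each line ends with `True`: the exact tail sits below the Chernoff estimate, which in turn sits below $5^{-M}$.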

5 Lower bound for multiple round protocols 

In this section we prove our main lower bound result: 
Theorem 4. For any $r \le \log^* n$, an $r$-round probabilistic protocol for $\mathrm{EE}_n$ with error probability at most $1/3$ sends at least one message of size $\Omega(n \log^{(r)} n)$.

Note that the r = 1 round case of this theorem was proved as Theorem 2 in Section 3. 
The other extreme, which immediately follows from Theorem 4, is the following. 

Corollary 1. Any probabilistic protocol for $\mathrm{EE}_n$ with maximum message size $O(n)$ and error $1/3$ has at least $\log^* n - O(1)$ rounds.

Theorem 4 is a direct consequence of the corresponding statement on deterministic protocols with small distributional error on the uniform distribution; see Theorem 5 at the end of this section. Indeed, we can decrease the error of a randomized protocol below any constant $\varepsilon > 0$ for the price of increasing the message length by a constant factor, then we can fix the coins of this low-error protocol in a way that makes the resulting deterministic protocol $Q$ err on at most an $\varepsilon$ fraction of the possible inputs. Applying Theorem 5 to the protocol $Q$ proves Theorem 4.

In the rest of this section we use round elimination to prove Theorem 5, that is, we will use $Q$ to solve smaller instances of the exists-equal problem in a way that the first message is always the same, and hence can be eliminated.

Suppose Alice sends the first message of $c$ bits in $Q$. By Lemma 2, there exists an $S \subseteq [t]^n$ of size $\mu(S) = 2^{-c-1}$ such that the first message of Alice is fixed when $x \in S$ and we have $\Pr_{y \sim \mu}[Q(x,y) \ne \mathrm{EE}(x,y)] \le 2\varepsilon$ for all $x \in S$. Fix such a set $S$ and let $k := t/2^{5(c+1)/(4n)+1}$ and $M := nk/(20t)$. By Theorem 3, there exists a $T \subseteq S$ of size $k^{n/5}$ and $I \subseteq [n]$ of size $n/5$ such that, defining

$N_x = \{y \in T \mid \mathrm{Match}(x_I, y_I) \ge M\}$,

we have $\Pr_{x \sim \mu}[N_x = \emptyset] \le 5^{-M}$ and $\mathbb{E}_{x \sim \mu}[\log |N_x|] \ge (n/5 - M)\log k - n \log k / 5^M$. Let us fix such sets $T$ and $I$. Note also that Theorem 3 guarantees that $T$ is a strict subset of $S$. Designate an arbitrary element of $S \setminus T$ as $x'_0$.

5.1 Embedding the smaller problem 

The players embed a smaller instance $u, v \in [t']^{n'}$ of the exists-equal problem in $\mathrm{EE}_n$, concentrating on the coordinates $I$ determined above. We set $n' := M/10$ and $t' := 4n'$. Ideally, the same embedding should guarantee low error probability for all pairs of inputs, but for technical reasons we need to know the number of coordinate agreements $\mathrm{Match}(u, v)$ for the input pairs $(u, v)$ in the smaller problem having $\mathrm{EE}_{n'}(u, v) = 1$. Let $R \ge 1$ be this number, so we are interested in inputs $u, v \in [t']^{n'}$ with $\mathrm{Match}(u, v) = 0$ or $R$. We need this extra parameter so that we can eliminate a non-constant number of rounds and still keep the error bound a constant. For results on constant round protocols one can concentrate on the $R = 1$ case.

In order to solve the exists-equal problem with parameters $t'$ and $n'$, Alice and Bob use the joint random source to turn their inputs $u, v \in [t']^{n'}$ into longer random vectors $X', Y \in [t]^n$, respectively, and apply the protocol $Q$ above to solve this exists-equal problem for these larger inputs. Here we informally list the main requirements on the process generating $X'$ and $Y$. We require these properties for the random vectors $X', Y \in [t]^n$ generated from a fixed pair $u, v \in [t']^{n'}$ satisfying $\mathrm{Match}(u, v) = 0$ or $R$.

(P1) $\mathrm{EE}(X', Y) = \mathrm{EE}(u, v)$ with large probability,

(P2) $\mathrm{supp}(X') = T \cup \{x'_0\}$ and

(P3) for most $x' \sim X'$, we have that $\mathrm{dist}(Y \mid X' = x')$ is close to the uniform distribution on $[t]^n$.
Combining these properties with the fact that $\Pr_{y \sim \mu}[Q(x,y) \ne \mathrm{EE}(x,y)] \le 2\varepsilon$ for each $x \in S$, we will argue that for the considered pairs of inputs $Q(X', Y)$ equals $\mathrm{EE}(u, v)$ with large probability, thus the combined protocol solves the small exists-equal instance with small error, at least for input pairs with $\mathrm{Match}(u, v) = 0$ or $R$. Furthermore, by Property (P2) the first message of Alice will be fixed and hence does not need to be sent, making the combined protocol one round shorter.

The random variables $X'$ and $Y$ are constructed as follows. Let $m := 2n/(MR)$ be an integer. Each player repeats his or her input ($u$ and $v$, respectively) $m$ times, obtaining a vector of size $n/(5R)$. Then, using the shared randomness, the players pick $n/(5R)$ uniform random maps $m_i : [t'] \to [t]$ independently and apply $m_i$ to the $i$th coordinate. Furthermore, the players pick a uniform random 1-1 mapping $\pi : [n/(5R)] \to I$ and use it to embed the coordinates of the vectors they constructed among the coordinates of the vectors $X$ and $Y$ of length $n$. The remaining $n - n/(5R)$ coordinates of $X$ are picked uniformly at random by Alice and similarly, the remaining $n - n/(5R)$ coordinates of $Y$ are picked uniformly at random by Bob. Note that the marginal distributions of both $X$ and $Y$ are uniform on $[t]^n$. If $\mathrm{Match}(u, v) = 0$ the vectors $X$ and $Y$ are independent, while if $\mathrm{Match}(u, v) = R$, then $Y$ can be obtained by selecting a random subset of $I$ of cardinality $mR$, copying the corresponding coordinates of $X$ and filling the rest of $Y$ uniformly at random.
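
The construction of $X$ and $Y$ just described can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `embed` and its signature are hypothetical, a single pseudo-random generator stands in for both the shared and the private randomness, values range over $1, \ldots, t$, and coordinates are 0-indexed.

```python
import random

def embed(u, v, n, t, I, M, R, rng):
    """Turn u, v in [t']^{n'} into X, Y in [t]^n, planting the repeated and
    randomly relabelled inputs on coordinates chosen inside I."""
    n_small = len(u)                  # n' in the text
    t_small = 4 * n_small             # t' = 4n'
    m = (2 * n) // (M * R)            # number of repetitions, assumed integral
    length = m * n_small              # equals n/(5R) when n' = M/10
    coords = sorted(I)
    assert length <= len(coords)
    uu, vv = list(u) * m, list(v) * m
    # Shared randomness: one uniform map [t'] -> [t] per position and an injection pi.
    maps = [[rng.randint(1, t) for _ in range(t_small)] for _ in range(length)]
    pi = rng.sample(coords, length)
    # Private randomness fills every coordinate; planted ones are overwritten below.
    X = [rng.randint(1, t) for _ in range(n)]
    Y = [rng.randint(1, t) for _ in range(n)]
    for j in range(length):
        X[pi[j]] = maps[j][uu[j] - 1]
        Y[pi[j]] = maps[j][vv[j] - 1]
    return X, Y

rng = random.Random(0)
n, t, M, R = 100, 400, 20, 1          # so n' = 2, t' = 8, m = 10
I = set(range(20))                    # an arbitrary index set of size n/5
X, Y = embed([3, 7], [3, 2], n, t, I, M, R, rng)
print(sum(X[i] == Y[i] for i in I) >= 10)    # True: at least mR planted matches
```

Because both players apply the same map $m_i$ at position $i$, every agreement between the repeated inputs survives, which is why $\mathrm{Match}(X_I, Y_I) \ge mR$ whenever $\mathrm{Match}(u, v) = R$.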

This completes the description of the random process for Bob. However, Alice generates one more random variable $X'$ as follows. Recall that $N_x = \{z \in T \mid \mathrm{Match}(z_I, x_I) \ge M\}$. The random variable $X'$ is obtained by drawing $x \sim X$ first and then choosing a uniform random element of $N_x$. In the (unlikely) case that $N_x = \emptyset$, Alice chooses $X' = x'_0$.

Note that $X'$ either equals $x'_0$ or takes values from $T$, hence Property (P2) holds. In the next lemma we quantify and prove Property (P1) as well.

Lemma 6. Assume $n \ge 3$, $M \ge 2$ and $u, v \in [t']^{n'}$. We have

(i) if $\mathrm{Match}(u, v) = 0$ then $\Pr[\mathrm{EE}(X', Y) = 0] > 0.77$;

(ii) if $\mathrm{Match}(u, v) = R$, then $\Pr[\mathrm{EE}(X', Y) = 1] > 0.80$.

Proof. For the first claim, note that when $\mathrm{Match}(u, v) = 0$, the random variables $X$ and $Y$ are independent and uniformly distributed. We construct $X'$ based on $X$, so its value is also independent of $Y$. Hence $\Pr[\mathrm{EE}(X', Y) = 0] = (1 - 1/t)^n$. This quantity goes to $e^{-1/4}$ since $t = 4n$ and is larger than $0.77$ when $n \ge 3$. This establishes the first claim.

For the second claim let $J = \{i \in I \mid X_i = Y_i\}$ and $K = \{i \in I \mid X'_i = X_i\}$. By construction, $|J| = \mathrm{Match}(X_I, Y_I) \ge mR$ and $|K| = \mathrm{Match}(X'_I, X_I) \ge M$ unless $N_X = \emptyset$. By our construction, each $J \subseteq I$ of the same size is equally likely by symmetry, even when we condition on a fixed value of $X$ and $X'$. Thus we have $\mathbb{E}[|J \cap K| \mid N_X \ne \emptyset] \ge mRM/|I| = 10$ and $\Pr[J \cap K = \emptyset \mid N_X \ne \emptyset] < e^{-10}$. Note that $X$ is distributed uniformly over $[t]^n$, therefore by Theorem 3(i) the probability that $N_X = \emptyset$ is at most $5^{-M}$. Note that $\mathrm{Match}(X', Y) \ge |J \cap K|$ and thus $\Pr[\mathrm{EE}(X', Y) = 0] \le \Pr[J \cap K = \emptyset] \le \Pr[J \cap K = \emptyset \mid N_X \ne \emptyset] + \Pr[N_X = \emptyset] < e^{-10} + 5^{-M}$, which is below $0.20$ for $M \ge 2$. This completes the proof. □
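
The bound $\Pr[J \cap K = \emptyset \mid N_X \ne \emptyset] < e^{-10}$ is a statement about a uniformly random subset avoiding a fixed one. A minimal sketch, with hypothetical names and sample sizes chosen so that $|J| \cdot |K| / |I| = 10$ as in the proof, computes this probability exactly in the extremal case $|J| = mR$, $|K| = M$:

```python
from math import comb, exp

def prob_disjoint(size_I, size_J, size_K):
    """Pr[J and K are disjoint] for a uniform size_J-subset J of an index set of
    size size_I and a fixed subset K of size size_K."""
    return comb(size_I - size_K, size_J) / comb(size_I, size_J)

# Sample parameters: n = 1000, |I| = n/5 = 200, M = 20, mR = 2n/M = 100.
p = prob_disjoint(200, 100, 20)
print(p < exp(-10))                   # True
print(exp(-10) + 5.0 ** (-2) < 0.2)   # True: the error bound of Lemma 6(ii) for M = 2
```

The exact probability is at most $(1 - |K|/|I|)^{|J|} \le e^{-|J||K|/|I|}$, which is where the constant $e^{-10}$ comes from.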

We measure "closeness to uniformity" in Property (P3) by simply calculating the entropy. This entropy argument is postponed to the next subsection; here we show how such a bound on the entropy implies that the error introduced by $Q$ is small.

Lemma 7. Let $x' \in S$ be fixed and let $\gamma$ be a probability in the range $2\varepsilon < \gamma < 1$. If $H(Y \mid X' = x') \ge n \log t - D_2(\gamma \,\|\, 2\varepsilon)$ then $\Pr_{y \sim Y \mid X' = x'}[Q(x', y) \ne \mathrm{EE}(x', y)] \le \gamma$.

Proof. For a distribution $\nu$ over $[t]^n$, let $e(\nu) = \Pr_{y \sim \nu}[Q(x', y) \ne \mathrm{EE}(x', y)]$. We prove the contrapositive of the statement of the lemma, that is, assuming $\Pr_{y \sim Y \mid X' = x'}[Q(x', y) \ne \mathrm{EE}(x', y)] > \gamma$ we prove $H(Y \mid X' = x') < n \log t - D_2(\gamma \,\|\, 2\varepsilon)$:

$n \log t - H(Y \mid X' = x') = D(\mathrm{dist}(Y \mid X' = x') \,\|\, \mu)$
$\ge D_2(e(\mathrm{dist}(Y \mid X' = x')) \,\|\, e(\mu))$
$> D_2(\gamma \,\|\, 2\varepsilon)$,

where the first inequality follows from the chain rule for the Kullback-Leibler divergence. □

5.2 Establishing Property (P3) 

We quantify Property (P3) using the conditional entropy $H(Y \mid X')$. If $\mathrm{Match}(u, v) = R$, our process generates $X$ and $Y$ with the expected number $\mathbb{E}[\mathrm{Match}(X_I, Y_I)]$ of matches only slightly more than the minimum $mR$. We lose most of these matches with $Y$ when we replace $X$ by $X'$, and only an expected constant number remains. A constant number of forced matches with $X'$ within $I$ restricts the number of possible vectors $Y$, but it only decreases the entropy by $O(1)$. The calculations in this subsection make this intuitive argument precise.

Lemma 8. Let $X', Y$ be as constructed above. The following hold.

(i) If $\mathrm{Match}(u, v) = 0$ we have $H(Y \mid X') = n \log t$.

(ii) If $M > 100 \log n$ and $\mathrm{Match}(u, v) = R$ we have $H(Y \mid X') = n \log t - O(1)$.

Proof. Part (i) holds as $Y$ is uniformly distributed and independent of $X'$ whenever $\mathrm{EE}(u, v) = 0$.

For part (ii) recall that if $\mathrm{Match}(u, v) = R$ one can construct $X$ and $Y$ by uniformly selecting a size $mR$ set $L \subseteq I$ and selecting $X$ and $Y$ uniformly among all pairs satisfying $X_L = Y_L$. Recall that $L$ is the set of coordinates to which the $mR$ matches between $u^m$ and $v^m$ were mapped. These are the "intentional matches" between $X_I$ and $Y_I$. Note that there may also be "unintended matches" between $X_I$ and $Y_I$, but not too many: their expected number is $(n/5 - mR)/t < 1/20$. Given any fixed $L$, the marginal distributions of both $X$ and $Y$ are still uniform, so in particular $X$ is independent of $L$ and so is $X'$ constructed from $X$. Therefore we have

$H(Y \mid X') = H(Y \mid X', L) + H(L) - H(L \mid Y, X')$.

We treat the terms separately. First we split the first term:

$H(Y \mid X', L) = H(Y_L \mid X', L) + H(Y_{[n] \setminus L} \mid X', L, Y_L)$

and use that $Y_{[n] \setminus L}$ is uniformly distributed for any fixed $L$, $X'$ and $Y_L$, making

$H(Y_{[n] \setminus L} \mid X', L, Y_L) = (n - mR)\log t$.

We have $X_L = Y_L$, thus

$H(Y_L \mid X', L) = H(X_L \mid X', L)$
$\ge \frac{mR}{n/5}\, H(X_I \mid X')$
$\ge mR \log t - 10 \log k - \frac{mR}{5^{M-1}} \log k$,

where the first inequality follows by Lemma 1 as $L$ is uniform and independent of $X$ and $X'$, and the second inequality follows from Lemma 9 that we will prove shortly and the formula defining $m$.

The next term, $H(L)$, is easy to compute as $L$ is a uniform subset of $I$ of size $mR$:

$H(L) = \log \binom{n/5}{mR}$.

It remains to bound the term $H(L \mid Y, X')$. Let $Z = \{i \mid i \in I \text{ and } X'_i = Y_i\}$. Note that $Z$ can be derived from $X', Y$ (as $I$ is fixed), hence $H(L \mid Y, X') \le H(L \mid Z)$. Further, let $C = |Z \setminus L|$. We obtain

$H(L \mid Y, X') \le H(L \mid Z) \le H(L \mid Z, C) + H(C)$
$\le \mathbb{E}_{Z,C}\left[\log \binom{n/5 - |Z| + C}{mR - |Z| + C}\right] + \mathbb{E}_{Z,C}\left[\log \binom{|Z|}{C}\right] + 2$,

where we used $H(C) < 2$. Note that for any fixed $x' \in T$ and $x \in \mathrm{supp}(X \mid X' = x')$, we have

$\mathbb{E}[|Z| - C \mid X = x, X' = x'] = \mathrm{Match}(x_I, x'_I)\, mR/(n/5) \ge 10$

as $\mathrm{Match}(x_I, x'_I) \ge M$ by definition.
Hence we have 



log 



n/5 
mi? 



fn/5-\Z\ + \C\\ , n ^, , 
E
z,c 



log ( '^' 



<E[|Zn < 20. 



Summing the estimates above for the various parts of $H(Y \mid X')$, the statement of the lemma follows. □

It remains to prove the following simple lemma that "reverses" the conditional entropy bound in Theorem 3(ii):

Lemma 9. For any $u, v \in [t']^{n'}$ we have $H(X_I \mid X') \ge \frac{n}{5}\log t - M \log k - n \log k / 5^M$.

Proof. Using the fact that $H(A, B) = H(A \mid B) + H(B) = H(B \mid A) + H(A)$ we get

$H(X_I \mid X') = H(X' \mid X_I) + H(X_I) - H(X')$
$\ge \frac{n}{5}\log t + H(X' \mid X_I) - \frac{n}{5}\log k$,

where in the last step we used $H(X') \le \log |\mathrm{supp}(X')| = \log |T| = \frac{n}{5}\log k$ and $H(X_I) = (n/5)\log t$ as $X$ is uniformly distributed.

Observe that $H(X' \mid X_I) = H(X' \mid X) = \mathbb{E}_{x \sim \mu}[\log |N_x|]$, where $\log 0$ is now taken to be $0$. From Theorem 3(ii) we get $H(X' \mid X) \ge \frac{n}{5}\log k - M \log k - n \log k / 5^M$, finishing the proof of the lemma. □

5.3 The round elimination lemma 

Let $\nu_n$ be the uniform distribution on $[t]^n \times [t]^n$, where we set $t = 4n$. The following lemma gives the base case of the round elimination argument.

Lemma 10. Any 0-round deterministic protocol for $\mathrm{EE}_n$ has at least $0.22$ distributional error on $\nu_n$, when $n \ge 1$.

Proof. The output of the protocol is decided by a single player, say Bob. For any given input $y \in [t]^n$ we have $3/4 \le \Pr_{x \sim \mu}[\mathrm{EE}(x, y) = 0] < e^{-1/4} < 0.78$. Therefore the distributional error is at least $0.22$ for any given $y$ regardless of the output Bob chooses, thus the overall error is also at least $0.22$. □
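
The numerical range in the proof of Lemma 10 is easy to confirm. For a fixed $y$ and uniform $x \in [t]^n$ with $t = 4n$, the coordinates mismatch independently with probability $1 - 1/t$ each, so $\Pr[\mathrm{EE}(x, y) = 0] = (1 - 1/(4n))^n$. A minimal sketch (the function name is illustrative):

```python
from math import exp

def prob_no_match(n):
    """Pr[EE(x, y) = 0] for fixed y and uniform x in [t]^n with t = 4n."""
    t = 4 * n
    return (1 - 1 / t) ** n

print(prob_no_match(1))    # 0.75
print(all(0.75 <= prob_no_match(n) < exp(-0.25) < 0.78 for n in range(1, 2001)))  # True
# Whatever Bob outputs, he errs with probability at least min(p, 1 - p) >= 0.22.
print(all(min(prob_no_match(n), 1 - prob_no_match(n)) >= 0.22 for n in range(1, 2001)))  # True
```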

Now we give our full round elimination lemma. 

Lemma 11. Let $r > 0$, $c$, $n$ be integers such that $c < (n \log n)/2$. There is a constant $0 < \varepsilon_0 < 1/200$ such that if there is an $r$-round deterministic protocol with $c$-bit messages for $\mathrm{EE}_n$ that has $\varepsilon_0$ error on $\nu_n$, then there is an $(r-1)$-round deterministic protocol with $O(c)$-bit messages for $\mathrm{EE}_{n'}$ that has $\varepsilon_0$ error on $\nu_{n'}$, where $n' = \Omega(n/2^{5c/(4n)})$.

Proof. We start with an intuitive description of our reduction. Let us be given the deterministic protocol $Q$ for $\mathrm{EE}_n$ that errs on an $\varepsilon_0$ fraction of the inputs. To solve an instance $(u, v)$ of the smaller $\mathrm{EE}_{n'}$ problem the players perform the embedding procedure described in Section 5.1 $k_0$ times independently for each parameter $R \in [R_0]$. Here $k_0$ and $R_0$ are constants we set later. They perform the protocol $Q$ in parallel for each of the $k_0 R_0$ pairs of inputs they generated. Then they take the majority of the $k_0$ outputs for a fixed parameter $R$. We show that this result gives the correct value of $\mathrm{EE}(u, v)$ with large probability provided that $\mathrm{Match}(u, v) = 0$ or $R$. Finally they take the OR of these results for the $R_0$ possible values of $R$. By the union bound this gives the correct value $\mathrm{EE}(u, v)$ with large probability provided $\mathrm{Match}(u, v) \le R_0$. Fixing the random choices of the reduction we obtain a deterministic protocol. The probability of error for the uniform random input can only grow by the small probability that $\mathrm{Match}(u, v) > R_0$ and we make sure it remains below $\varepsilon_0$. The rest of the proof makes this argument precise.

For random variables $X'$ and $Y$ constructed in Section 5.1, Lemma 8 guarantees that $H(Y \mid X') \ge n \log t - \alpha_0$ for some constant $\alpha_0$, as long as $M > 100 \log n$ and $\mathrm{Match}(u, v) = R$. Let $\varepsilon_0$ be a constant such that $D_2(1/10 \,\|\, 2\varepsilon_0) > 200(\alpha_0 + 1)$. Note that such an $\varepsilon_0$ can be found as $D_2(1/10 \,\|\, \varepsilon)$ tends to infinity as $\varepsilon$ goes to $0$. We can bound $\Pr_{(x,y) \sim \nu_m}[\mathrm{Match}(x, y) \ge l] \le 1/(4^l\, l!)$ for all $m \ge 1$. We set $R_0$ such that $\Pr_{(x,y) \sim \nu_m}[\mathrm{Match}(x, y) > R_0] \le \varepsilon_0/2$ for all $m \ge 1$.
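
The choice of the constants $\varepsilon_0$ and $R_0$ can be illustrated numerically. The sketch below assumes that $D_2(p \,\|\, q)$ denotes the binary Kullback-Leibler divergence measured in bits (this excerpt does not restate the definition), and the helper names `d2`, `pick_eps0` and `pick_R0` are hypothetical. The demonstration uses a small target value: the actual requirement $200(\alpha_0 + 1)$ forces an $\varepsilon_0$ far below the range of floating-point numbers, so it would need arbitrary-precision arithmetic.

```python
from math import log2, factorial

def d2(p, q):
    """Binary KL divergence D_2(p || q) in bits, for 0 < p, q < 1 (assumed definition)."""
    return p * log2(p / q) + (1 - p) * log2((1 - p) / (1 - q))

def pick_eps0(target):
    """Halve eps, starting below 1/200, until D_2(1/10 || 2*eps) exceeds target."""
    eps = 1 / 400
    while d2(0.1, 2 * eps) <= target:
        eps /= 2
    return eps

def pick_R0(eps0):
    """Smallest R with 1 / (4^(R+1) (R+1)!) <= eps0 / 2; by the tail bound
    Pr[Match >= l] <= 1 / (4^l l!) this gives Pr[Match > R] <= eps0 / 2."""
    R = 1
    while 1 / (4 ** (R + 1) * factorial(R + 1)) > eps0 / 2:
        R += 1
    return R

eps = pick_eps0(10)
print(0 < eps < 1 / 200, d2(0.1, 2 * eps) > 10)    # True True
print(pick_R0(1e-3))                                 # 3
```

The divergence grows by roughly $0.1$ bit per halving of $\varepsilon$, which shows both that a suitable $\varepsilon_0$ always exists and that it is extremely small.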

Let $Q$ be a deterministic protocol for $\mathrm{EE}_n$ that sends $c < (n \log n)/2$ bits in each round and that has $\varepsilon_0$ error on $\nu_n$. Let $S$ be as constructed in Lemma 2 and let $M$ be as defined in Theorem 3. We have $M = \frac{n}{40}\, 2^{-5(c+1)/(4n)}$ as $t = 4n$ and $\mu(S) = 2^{-(c+1)}$ by Lemma 2. Note that by our choice of $c$, we have $M > 100 \log n$, hence the hypotheses of Lemma 8 are satisfied.

Let $n' = M/10 = \frac{n}{400}\, 2^{-5(c+1)/(4n)}$. Now we give a randomized protocol $Q'$ for $\mathrm{EE}_{n'}$. Suppose the players are given an instance of $\mathrm{EE}_{n'}$, namely the vectors $(u, v) \in [4n']^{n'} \times [4n']^{n'}$. Let $k_0 = 10 \log(R_0 + 1/\varepsilon_0)$. For $R \in [R_0]$ and $k \in [k_0]$, the players construct the vectors $X'_{R,k}$ and $Y_{R,k}$ as described in Section 5.1 with parameter $R$ and with fresh randomness for each of the $R_0 k_0$ procedures. The players run $R_0 k_0$ instances of protocol $Q$ in parallel, on inputs $X'_{R,k}, Y_{R,k}$ for $R \in [R_0]$ and $k \in [k_0]$. Note that the first message of the first player, Alice, is fixed for all instances of $Q$ by Property (P2) and Lemma 2. Therefore, the second player, Bob, can start the protocol assuming Alice has sent the fixed first message. After the protocols finish, for each $R \in [R_0]$, the last player who received a message computes $b_R$ as the majority of $Q(X'_{R,k}, Y_{R,k})$ for $k \in [k_0]$. Finally, this player outputs $0$ if $b_R = 0$ for all $R \in [R_0]$ and outputs $1$ otherwise.
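
The final aggregation step of $Q'$ (a majority vote over the $k_0$ repetitions for each $R$, followed by an OR over $R$) can be sketched as follows. The function name `combine` and the nested-list input format are illustrative assumptions, and ties in the majority vote are resolved to $0$ here, a detail the text does not specify.

```python
def combine(outputs):
    """outputs[R][k] is the bit returned by Q on the k-th embedding made with
    parameter R. Returns the OR over R of the per-R majority bits b_R."""
    b = [int(2 * sum(row) > len(row)) for row in outputs]   # strict majority per R
    return int(any(b))

print(combine([[0, 0, 1], [0, 1, 0]]))   # 0: no parameter R has a majority of ones
print(combine([[0, 0, 1], [1, 1, 0]]))   # 1: the second parameter votes 1
```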

Suppose now that $\mathrm{EE}(u, v) = 0$. By Lemma 6(i), we have $\Pr[\mathrm{EE}(X'_{R,k}, Y_{R,k}) = 0] > 0.77$ for each $R$ and $k$. Recall that $Y_{R,k}$ is distributed uniformly for each $R$ and $k$, and since $\mathrm{EE}(u, v) = 0$, it is independent of $X'_{R,k}$. Therefore, by $X'_{R,k} \in S$ (Property (P2)) and the fact that $\Pr_{y \sim \mu}[Q(x, y) \ne \mathrm{EE}(x, y)] \le 2\varepsilon_0$ for all $x \in S$ as per Lemma 2, we obtain $\Pr[Q(X'_{R,k}, Y_{R,k}) = 0] \ge 0.77 - 2\varepsilon_0 \ge 0.76$. By the Chernoff bound we have $\Pr[b_R = 1] \le \varepsilon_0/(2R_0)$, and by the union bound $\Pr[Q' \text{ outputs } 0] \ge 1 - \varepsilon_0/2$.

Let us now consider the case $\mathrm{Match}(u, v) = R$ for some $R \in [R_0]$. Fix any $k \in [k_0]$ and set $X' = X'_{R,k}$, $Y = Y_{R,k}$. By Lemma 6(ii), $\Pr[\mathrm{EE}(X', Y) = 1] > 0.80$. By Lemma 8, $H(Y \mid X') \ge n \log t - \alpha_0$, and so we have $\mathbb{E}_{x' \sim X'}[H(Y) - H(Y \mid X' = x')] \le \alpha_0$. Let $Z = \{x' \mid H(Y) - H(Y \mid X' = x') > 10\alpha_0\}$. Note that $Y$ is uniform and has full entropy, therefore $H(Y) - H(Y \mid X' = x') \ge 0$. Using Markov's inequality we have $\Pr[X' \in Z] < 1/10$. When $X' \in Z$ we cannot effectively bound the probability that $\mathrm{EE}(u, v) \ne Q(X', Y)$; namely, we bound this probability by $1$. But if $X' \notin Z$, then by Lemma 7 and our choice of $\varepsilon_0$, we have $\Pr[\mathrm{EE}(X', Y) \ne Q(X', Y)] \le 1/10$. Furthermore, by Lemma 6(ii), $\Pr[\mathrm{EE}(u, v) \ne \mathrm{EE}(X', Y)] \le 0.20$, hence with probability at least $0.60$ we have $\mathrm{EE}(u, v) = Q(X', Y)$. This happens independently for all the values of $k \in [k_0]$, so by the Chernoff bound and our choice of $k_0$, we have $\Pr[Q' \text{ outputs } 0] \le \Pr[b_R = 0] \le \varepsilon_0/2$.

Finally, $\Pr_{(u,v) \sim \nu_{n'}}[\mathrm{Match}(u, v) > R_0] \le \varepsilon_0/2$ by our choice of $R_0$. Note that the protocol $Q'$ uses a shared random bit string, say $W$, in the construction of the vectors $X'_{R,k}$ and $Y_{R,k}$. Hence, overall, we have

$\Pr_{(u,v) \sim \nu_{n'},\, W}[\mathrm{EE}(u, v) = Q'(u, v)] \ge 1 - \varepsilon_0$.

Since we measure the error of the protocol under a distribution, we can fix $W$ to a value without increasing the error under the aforementioned distribution by the so-called easy direction of Yao's lemma. Namely, there exists a $w \in \mathrm{supp}(W)$ such that

$\Pr_{(u,v) \sim \nu_{n'}}[\mathrm{EE}(u, v) = Q'(u, v) \mid W = w] \ge 1 - \varepsilon_0$.

Fix such a $w$. Observe that $Q'$ is an $(r-1)$-round protocol for $\mathrm{EE}_{n'}$ where $n' = \frac{n}{400}\, 2^{-5(c+1)/(4n)} = \Omega(n/2^{5c/(4n)})$ and it sends at most $R_0 k_0 c = O(c)$ bits in each message. Furthermore, $Q'$ is deterministic and has at most $\varepsilon_0$ error on $\nu_{n'}$ as desired. □

Theorem 5. There exists a constant $\varepsilon_0$ such that for any $r \le \log^* n$, an $r$-round deterministic protocol for $\mathrm{EE}_n$ which has $\varepsilon_0$ error on $\nu_n$ sends at least one message of size $\Omega(n \log^{(r)} n)$.

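Proof. Suppose we have an $r$-round protocol with $c$-bit messages for $\mathrm{EE}_n$ that has $\varepsilon_0$ error on $\nu_n$, where $c = \gamma n \log^{(r)} n$ for some $\gamma < 4/5 - o(1)$. By Lemma 11, this protocol can be converted to an $(r-1)$-round protocol with $\alpha c$-bit messages for $\mathrm{EE}_{n'}$ that has $\varepsilon_0$ error on $\nu_{n'}$, where $n' = \beta n / 2^{5c/(4n)}$ for some $\alpha, \beta > 0$. We only need to verify that $\alpha c \le \gamma n' \log^{(r-1)} n'$. We have

$\gamma n' \log^{(r-1)} n' = \frac{\gamma\beta n}{2^{5c/(4n)}} \log^{(r-1)}\!\left(\beta n / 2^{5c/(4n)}\right)$
$= \frac{\gamma\beta n}{2^{(5\gamma/4)\log^{(r)} n}} \log^{(r-1)}\!\left(\beta n / 2^{5c/(4n)}\right)$
$\ge \gamma\beta n \left(\log^{(r-1)} n\right)^{1 - 5\gamma/4 - o(1)}$
$\ge \gamma\alpha n \log^{(r)} n$

for $\gamma < 4/5 - o(1)$ and large enough $n$. Therefore, by iteratively applying Lemma 11 we obtain a 0-round protocol for $\mathrm{EE}_{\bar n}$ that makes $\varepsilon_0$ error on $\nu_{\bar n}$ for some $\bar n$ satisfying $\gamma \bar n^2 = \gamma \bar n \log^{(0)} \bar n \ge \alpha^r c$. Therefore $\bar n \ge 1$ and since $\varepsilon_0 < 0.22$, the protocol we obtain contradicts Lemma 10, showing that the protocol we started with cannot exist. □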
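
The bounds in this section are stated in terms of the iterated logarithm $\log^{(r)} n$ and of $\log^* n$. The following minimal sketch computes both, assuming base-2 logarithms, the convention $\log^{(0)} n = n$ used in the proof above, and $\log^* n$ defined as the number of times $\log$ must be applied before the value drops to at most 1 (the paper's exact convention is not restated in this excerpt). The function names are illustrative.

```python
from math import log2

def iter_log(n, r):
    """log^{(r)} n: the base-2 logarithm applied r times, with log^{(0)} n = n."""
    x = float(n)
    for _ in range(r):
        x = log2(x)
    return x

def log_star(n):
    """Number of applications of log2 needed to bring n down to at most 1."""
    r, x = 0, float(n)
    while x > 1:
        x = log2(x)
        r += 1
    return r

print(iter_log(65536, 2))   # 4.0
print(log_star(65536))      # 4
# Scale n * log^{(r)} n of the message-size lower bound of Theorem 5 for small r:
print([65536 * iter_log(65536, r) for r in (1, 2, 3)])   # [1048576.0, 262144.0, 131072.0]
```

Each extra round lowers the bound from $n \log^{(r)} n$ to $n \log^{(r+1)} n$, and only after about $\log^* n$ rounds does it become linear in $n$, in line with Corollary 1.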

Remark. We note that in the proof of Theorem 4, to show that a protocol with small communication does not exist, we start with the given protocol and apply the round elimination lemma (i.e., Lemma 11) $r$ times to obtain a 0-round protocol with small error probability, which is shown to be impossible by Lemma 10. Alternatively, one can apply the round elimination $r - 1$ times to obtain a 1-round protocol with $o(n \log n)$ communication for $\mathrm{EE}_n$, which is ruled out by Theorem 2.
6 Discussion

The $r$-round protocol we gave in Section 2 solves the sparse set disjointness problem in $O(k \log^{(r)} k)$ total communication. As we proved in Section 5 this is optimal. The same, however, cannot be said of the error probability. With the same protocol, but with a more careful setting of the parameters, the exponentially small error $O(2^{-\sqrt{k}})$ of the $\log^* k$-round protocol can be further decreased to $2^{-k^{1-o(1)}}$.

For small (say, constant) values of $r$ this protocol cannot achieve exponentially small error without an increase in the complexity if the universe size $m$ is unbounded. But if $m$ is polynomial in $k$ (or even slightly larger, $m = \exp^{(r)}(O(\log^{(r)} k))$), we can replace the last round of the protocol by one player deterministically sending his or her entire "current set" $S_r$. With careful setting of the parameters in other rounds, this modified protocol has the same $O(k \log^{(r)} k)$ complexity but the error is now exponentially small: $O(2^{-k/\log k})$. Note that in our lower bound on the $r$-round complexity of sparse set disjointness we use the exists-equal problem with parameters $n = k$ and $t = 4k$. This corresponds to the universe size $m = tn = 4k^2$. In this case any protocol solving the exists-equal problem with $1/3$ error can be strengthened to exponentially small error using the same number of rounds and only a constant factor more communication.

Our lower and upper bounds match for the exists-equal problem with parameters $n$ and $t = \Omega(n)$, since the upper bounds were established without any regard to the universe size, while the lower bounds worked for $t = 4n$. Extensions of the techniques presented in this paper give matching bounds also in the case $3 \le t < n$, where the $r$-round complexity is $\Theta(n \log^{(r)} t)$ for $r \le \log^* t$. Note, however, that in this case one needs to consider significantly more complicated input distributions and a more refined isoperimetric inequality that does not permit arbitrary mismatches. The $\Omega(n)$ lower bound applies for the exists-equal problem of parameters $n$ and $t \ge 3$ regardless of the number of rounds, as the disjointness problem on a universe of size $n$ is a sub-problem. For $t = 2$ the situation is drastically different: the exists-equal problem with $t = 2$ is equivalent to a single equality problem.

Finally, a remark on using the joint random source model of randomized protocols throughout the paper. By a result of Newman [37] our protocols of Section 2 can be made to work in the private coin model (or even if one of the players is forced to behave deterministically) by increasing the first message length by $O(\log\log N + \log(1/\varepsilon))$ bits, where $N = \binom{m}{k}$ is the number of possible inputs. In our case this means adding the term $O(\log\log m) + o(k)$ to our bound of $O(k \log^{(r)} k)$, since our protocols make at least $\exp(-k/\log k)$ error. This additional cost is insignificant for reasonably small values of $m$, but it is necessary for large values as the equality problem, which is an instance of disjointness, requires $\Omega(\log\log m)$ bits in the private coin model.

Note also that we achieve a super-linear increase in the communication for the OR of $n$ instances of equality even in the private coin model for $r = 1$. For $r \ge 2$, no such increase happens in the private coin model, as the communication complexity of $\mathrm{EE}_n^t$ is at most $O(n \log\log t)$ while a single equality problem already requires $\Omega(\log\log t)$ bits.